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ABSTRACT 

We present a set of cosmological simulations with radiative transfer in order to model the reionization 
history of the universe from z = 18 down to z = 6. Galaxy formation and the associated star formation 
are followed self-consistently with gas and dark matter dynamics using the RAMSES code, while radiative 
transfer is performed as a post-processing step using a moment-based method with Ml closure relation 
in the ATON code. 

The latter has been ported to a multiple Graphical Processing Units (GPU) architecture using 
the CUDA language together with the MPI library, resulting in an overall acceleration that allows 
us to tackle radiative transfer problems at a significantly higher resolution than previously reported: 
1024^ + 2 levels of refinement for the hydrodynamics adaptive grid and 1024^ for the radiative transfer 
Cartesian grid. We reach typical acceleration factor close to 100 x when compared to the CPU version, 
allowing us to perform 1/4 million time steps in less than 3000 GPU hours. 

We observe good convergence properties between our different resolution runs for various volume- 
and mass-averaged quantities such as neutral fraction, UV background and Thomson optical depth, 
as long as the effects of finite resolution on the star formation history are properly taken into account. 
We also show that the neutral fraction depends on the total mass density, in a way close to the 
predictions of photoionization equilibrium, as long as the effect of self-shielding are included in the 
background radiation model. Although our simulation suite has reached unprecedented mass and 
spatial resolution, we still fail at reproducing the z ~ 6 constraints on the neutral fraction of hydrogen 
and the intensity of the UV background. 

In order to account for unresolved density fiuctuations, we have modified our chemistry solver with 
a simple clumping factor model. Using our most spatially resolved simulation (12.5 Mpc/h with 
1024^ particles) to calibrate our subgrid model, we have resimulated our largest box (100 Mpc/h with 
1024^ particles) with the modified chemistry, successfully reproducing the observed level of neutral 
Hydrogen in the spectra of high redshift quasars. We however didn't reproduce (by a factor of 2) 
the average photoionization rate inferred from the same observations. We argue that this discrepancy 
could be partly explained by the fact that the average radiation intensity and the average neutral 
fraction depends on different regions of the gas density distribution, so that one quantity cannot be 
simply deduced from the other. 
Subject headings: cosmology- numerical simulations 



1. INTRODUCTION 

After self-gravity, hydr o dynaniics a nd radiative cool- 
ingfseee^. .Efstathiou et al.l (119851): iHernauist et al. 
(119911) : ICenI (jl992[ )7 lKatz et al.l (jl996r ): iBertschingerl 
(•1998]) among other historical references), radiative 
transfer has been included only recently in cosmological 
simulations of the formation of large scale structu re in 
the Un iverse (see e.g. Abel et al. (1999); Gnedin fc Abell 
(j2QQlD : [Ciardi et al.l 02 001): Razo umov et al.l (120021) an d 



more recently 



Im^ 



et al 



llliev et al.. (.2006a): PlVac 
iBaek et al.l (IIT 



Ced (j2007h : 
) . Among 

many different astrophysical problems that require a 
proper treatment of light propagation, cosmic reioniza- 
tion stands out as a particularly challenging one, because 
the ionizing radiation field plays a key role in the tran- 
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sition from the "dark ages" to the era of galaxy forma- 
tion : the chronometry and the geometry of the process 
is entirely related to the way matter and radiation inter- 
act. The proper numerical modeling of cosmic reioniza- 
tion represents a additional challenge, since it requires 
to capture a whole set of physical phenomena which are 
difficult to tackle on th eir own (see e.g. the review by 
iBarkana fc Loebl ()2001l )). In a nutshell, reionization can 
be described as " atoms being dissociated by UV photons 
emitted by stars formed in collapsed, self-gravitating ha- 
los". This requires to follow the dynamics of dark mat- 
ter and gas on large scale, cooling and star formation on 
galactic scales, the emission of ionizing radiation at mi- 
croscopic scales and finally, UV light propagation back to 
the cosmological scales. Because of this chain of causality 
involving many different cosmological fluids (dark mat- 
ter, gas, stars and photons), it is only recently that sig- 
nificant progresses were made in the field of cosmological 
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radiative transfer. 

Computer simulations of radiative transfer cover a 
wide range of techniques, most of them reviewed in 
iTrac fc GnedinI (^009) and with most implementations 
gathered in two sets of comparison papers (Iliev et al. 
l2QQ6bl |2QQ9[ ). Current cosmological radiative transfer 
codes successfully pass these rather academic tests, but 
it should be noted that only a few observational tests can 
be used as a probe to calibrate these rather complicate 
numerical tools. The first major constraint comes from 
quasars with the detection of Gunn-Peterson troughs and 
a decrease of the flux transmission in spectra of objects 
at z^6, which can be interpreted as the mark of the tran- 
sition from a ne u tral Unive r se to a n ionized one (see e.g. 
ISongailal (|2QQ4D . iFan et al.l (j2QQ6[ )). From the observed 
spectra and provided that some assumptions are made 
on the structure of the density field or the UV back- 
ground, important quantities such as mean free path, 
photoionization ra te or UV f i eld in te nsity can be con - 
strained (see e.g. iFan et al.l (j2QQ2[) . IFan et al.l (|2006D . 
iBolton fc HaehneTtI ( 2QQ7I )). These constraints provide 
anchor values at z ~ 6 for the calibration of cosmolog- 
ical simulations of reionization and track their ability 
to simula te the post-ove rl ap er a and the overlap itself 
(see e.g. iGnedin fc FanI (|2QQ6f )). However, this tech- 
nique only provide upper/lower boundaries at higher red- 
shifts as complete absorption can be reached with a neu- 
tral fraction as low as 0.001. Furthermore, since models 
are used to infer physical properties from flux transmis- 
sion, any agreement or disagreement between calcula- 
tions and quantities derived from ob servations should b e 
taken with caution (as noted by e.g. iTrac fc CenI (|2007l )) 
and in a reversed role the simulations may happen to be 
informative about the proper way to interpret data. The 
second set of constraints comes from the scattering of 
CMB photons by electrons released during the reioniza- 
tion process. Usually expressed in terms of the Thom- 
son optical depth r, current constraints from WMAP set 
r = 0.084 ± 0.016 implying a reds hift of (in stantaneous) 
reionization of 2; ^ 10.9±1.4 Koma tsu et al . (2009). This 
constraint results from the integrated impact of the elec- 
trons on the CMB properties and is therefore more sen- 
sitive to the complete history of cosmic reionization. 

In this paper, our goal is to confront our new radiative 
transfer code ATON to these observational constraints, 
using a set of hydrodynamical simulations at different 
resolutions. This code has already been presented and 
tested using a standard test suite in lAubert fc Tevssierl 
C2OO8'). The dynamical simulations include gravity and 
gas physics with mesh refinement, as well as widely 
adopted and well tested star formation recipe. The ra- 
diative transfer is performed as a post-processing step 
(full coupling of hydrodynamics with radiation is cur- 
rently underway). It relies on a moment-based descrip- 
tion of the prop agation of light in the same spirit as e.g. 
IGnedin fc Abill (|200L) or iFinlator et all (|2QQ9al) . The 
original ATON code has since been fully ported on Graph- 
ics Processing Units (GPU hereafter) architecture using 
CUDA. Thanks to the high acceleration rate (~ 100 x 
compared to CPU) made possible by such hardware, we 
have been able to simulate the radiative transfer at the 
same resolution as the hydrodynamics base grid with 
1024^ cells. The current article aims at reaching two ob- 
jectives : first, showing the ability of ATON to model prop- 



erly the reionization process and second, demonstrate the 
potential of GPU architecture for numerical cosmology. 
Regarding the ability to model the reionization, we par- 
tially recover the observational constraints at z ~ 6 if 
we include a simple clumping factor model. However we 
also find that the properties of the radiation field and the 
neutral fraction distribution are driven by very different 
regions, making it difficult to relate the average UV in- 
tensity to the average fraction of neutral gas. Regarding 
the adaptation of our code on GPU, we describe in de- 
tails in the Appendix how such architecture can be used 
at full power on this type of problems. 

This paper is organized as follows: first we describe the 
methodology and the simulations. Second we describe a 
first set of fiducial simulations and assess in particular 
the issues related to resolution and numerical conver- 
gence. Third, we introduce a simple prescription for the 
subgrid clumping obtained from our most resolved simu- 
lation (12.5 Mpc/h with 1024^ dark matter particles) and 
apply it to the largest simulation we have (100 Mpc/h 
with 1024^ dark matter particles). Finally, we discuss 
our results, forthcoming applications and possible im- 
provements. 

2. METHODOLOGY 
2.1. Simulations 

The cosmological simulatio ns analyzed i n this work 
were produced using RAMSES (Teyssier 2002). The cos- 
mological parameters follow the WMAP-5 constraints 
(|Komatsu et al.l |2009[ ) and the initial conditio ns were 
gener ated using the MPGrafic package (Prunet et al.l 
I2OO8I ). We have generated 4 sets of Gaussian random 
fields with different box sizes, based however on the same 
Poisson shot noise, so that the same structure should 
form at the same location, although with different tim- 
ings. The number of cells and dark matter particles was 
set to 1024^ and we allow for 2 more levels of refine- 
ment, resulting in the mass and spatial resolution el- 
ements quoted in Table [TJ The grid was dynamically 
refined up to the maximum allowed resolution, using 
a quasi-Lagrangian strategy: when the dark matter or 
baryons mass in a cell reaches 8 times the initial mass 
resolution, it is split into 8 children cells. 

Gas dynamics is modeled using a second-order unsplit 
Godunov scheme (Teyssier "2002|; iTevs sier et al.l 120061 : 
Froman g et al. 2006) based on the HLLC Riemann solver 
(|Tbro ef al.lll994D . We assume a perfect gas Equation of 
State (EoS) with 7 = 5/8. Gas metallicity is advected as 
a passive scalar, and is self-consistently accounted for in 
the cooling function. Note that in the present work, no 
radiation background was considered for the cosmologi- 
cal simulation. As gas cools down and settles into cen- 
trifugally supported discs, we need to provide a realistic 
model for the interstellar medium (ISM). Since the ISM 
is inherently multiphase and highly turbulent, it is be- 
yond the scope of present-day cosmological simulations 
to try to simulate it self-consistently. It is customary 
to rely on subgrid models, providing an effective EOS 
that capture the basic turbulent and thermal properties 
of this gas. Models with various deg rees of complexity 
have been proposed in the literature (lYepes et al.l [19971 : 
"Springel fc HernQuistl [2QQllSchave fc Vecchiall2008D . We 
follow the simple approach based on a temperature floor 
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^iven by a poly tropic EOS for gas 



T floor — 



r-i 



(1) 



where = 0.1 H/cc is the density threshold that defines 
the star forming gas, = 10^ K is a typical temperature 
mimicking both thermal and turbulent motions in the 
ISM and F = 5/3 is the polytropic index controlling the 
stiffness of the EOS. Gas is able to heat above this floor, 
but cannot cool down below it. Note that because of this 
temperature floor, the Jeans length in our galactic discs 
is always resolved. We also consider star formation using 
a similar phenomenological approach. In each cell with 
gas density larger than n*, we spawn new star particles 
at a rate given by 



with tff = 



32Gp 



(2) 



where tff is the free-fall time of the gaseous component 
and = 0.01 is the star formation efficiency. The star 
particle mass depends on the resolution (see Table [1]). 
For each star particle, we assume that 10% of its mass 
will go supernova after 10 Myr. We consider a supernova 
energy of 10^^ erg and one Mq of ejected metals per 10 
M0 average progenitor mass. This supernovae feedback 
was implemented in the RAMSES code using the "delayed 
cooling" scheme (Stinson et al. 2006). 

To summarize, we used for this simulation suite 
rather standard galaxy formation recipe, which have 
proven only recently to be quite s uccessful in repro- 
ducing the properties o f field spirals (|Maver 
IGovernato et al.l I2009L I2010D and dwarf galaxies. The 
only missing ingredient is the radiation field, which will 
be considered in a second step using our radiation solver. 

2.2. Radiative Transfer 

Each snapshot of the simulations is post-processed us- 
ing the ATON code , described and tested in details in 
lAubert fc Tevssierl (j2008) and briefly summarized in this 
section. The method relies on a momentum description 
of the ra diative transfer equa tions with an Ml closure 
relation ([Gonzalez et al.ll2007[ ). Radiation is described 
in terms of the first three moments of the distribution 
function of photons, the radiative energy density and 
the radiative flux F and the radiative pressure tensor P. 
These quantities are averaged over a group of frequencies 
and satisfy the usual conservation relations: 



dN 

~dt 
OF 



VF = 



c2VP = 



-kN + S, 
-kF, 



(3) 
(4) 



where stands for the local absorption rate and 5'(x, t) 
is the source field which includes the production sites of 
photons as well as the recombination radiation. The Ed- 
dington tensor D close the system through an equation 
of state: 

P = D7V, (5) 

where D is approxim ated by the Ml model 
(|Dubroca FeugeaslfToOOD : 



The quantity x depends only on the reduced radiation 
flux / = iFl/cA/" and spans the values from 1/3 (pure dif- 
fusion regime) to 1 (pure transport regime) and depends 
only on the local properties of the radiation fields. The 
exact formula for x can be found in Aubert fc Tevs sierl 
(2008). Such a formulation guarantees that the two ex- 
treme regimes are properly captured, while all interme- 
diate situations are approximated by a superposition of 
diffusion and transport. It should also be noted that 
this scheme differs from the common first-order flux lim- 
iter approach by its ability to cast shadows behind ab- 
sorbants (see lAubert fc Tev ssier 2008). 

The previous radiation conservation laws are solved us- 
ing an explicit time integration, resulting in a stringent 
CFL conditions on the time step due to the high value 
of the speed of light : 



Ax 

— > At. 

c 



(7) 
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However, thanks to GPU acceleration, we can speed up 
each individual time step so that the resulting scheme can 
still achieve high performance. The details of the GPUs 
implementation are given in the Appendix. Originally 
the code is able to evaluate intercell fluxes using both the 
Haardt-Lax-van Leer (HLL) or the simpler Lax-Friedrich 
(LF) scheme but only the latter has been used in the 
current work. 

The photo-chemistry in ATON is currently limited to 
Hydrogen with the associated cooling processes. Again, 
the energy conservation and the chemistry are solved in 
an explicit fashion and are sub-cycled dur ing a radiative 
transfer step using a scheme in the spirit of lAnninos et al.l 
(1^97). It turns out that most of the time the charac- 
teristic time scales involved in cooling and chemistry are 
longer than radiative time steps, thus limiting the impact 
of 'microphysics' calculations on the overall computation. 

All the processes (transport, cooling, and chemistry) 
and their equations are solved in a single frequency 
group where u > 13.6 eV and involve average quantities 
such as the hydrogen photoionization cross-sections de = 
2.49 • lO-^^cm^ (energy averaged), an = 2.93 • lO-^^cm^ 
(number averaged) and the typical ionizing photon en- 
ergy e = 20.27 eV, where a 50 OOOK black body spectrum 
is assumed. 

Typically, one complete radiative transfer simulation 
requires between 30 000 and 240 000 time steps depend- 
ing on the resolution which defines the time step and 
the starting redshift. The 800 Myr of cosmic evolution 
we would like to cover is described with a time reso- 
lution of 3500 years. The code has been deployed on 
GPU architecture using the API CUD A 2.2, (becoming 
thus CUDATON) developed for devices built by the Nvidia 
company. The code runs independently from the CPU, 
without any transfer between the host and the device 
except during the initial setup and for the outputs on 
hard drives. The typical acceleration observed compared 
to single-cpu runs is close to a 100. Using an additional 
MPI layer, CUDATON is able to run on multi-gpu architec- 
ture with communications between the devices, which re- 
quires additional transfer between hosts and GPUs. The 
additional cost is close to 10 percent of the total comput- 
ing time since data have to be transfered through PCI- 
Express ports. All the calculations here were performed 
on 128 Tesla C1060 devices on the Titane supercomputer 
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of the CCRT computing center. Typically a single radia- 
tive post-processing run on a 1024^ grid is performed in 
2.5 hours but can be as short a 1 hour for coarse simu- 
lations with simple physics and as long as 18 hours for 
our most realistic calculations. During the course of this 
project, a couple hundred of calculations over six months 
were performed to improve the code and to test our var- 
ious recipes. 

2.3. Source modeling for the radiative post-processing 

The sources of photons, namely young stars, are pro- 
duced by the cosmological simulations that return for 
each stellar particle its position, velocity, age, mass, and 
metallicity. From there, the source modeling is inspired 
from the procedure described in Back et al. (2009). Stel- 
lar particles are assumed to satisfy a Salpeter IMF result- 
ing in a global spectra well approximated by a 50 000 K 
black body. Individual lifetimes of stellar particles as 
ionizing sources are drawn randomly between 5 and 20 
Myrs. Overall, for each source, the production of ioniz- 
ing photons lies between 24 000 and 98 000 per stellar 
baryon over its lifetime. Because the sources appear at 
discrete times, due to the discontinuous production of 
snapshots, the sources contribution is smoothed out over 
all the duration between 2 successive snapshots using the 
following strategy. When modeling the radiative trans- 
fer from time tp to tp+i, we consider only star particles 
contained in snapshot p + 1. Knowing their age a^+i, we 
calculate their age = a^+i — (t^+i — tp) at time step 
p, which can be negative if the star appeared at a time 
a* between the two snapshots. Then : 

• if ap is greater than the source's lifetime L, it is 
discarded, 

• if ttp < 0, the source has been created between the 
two snapshots. However, it will contribute to the 
photon emission from tp to tp+i with a 'diluted 
photon' emission rate given by (a^+i — a*)/(tp+i — 
tp)N, 

• if < ttp < L the source will have an emission rate 

given by min(l, L — ap)/(tpj^i —tp)N. If the source 
ends its ionizing phase between the snapshots, it 
will nevertheless contribute continuously from tp 
to tpj^i with a 'diluted' emission rate. 

From there, sources are projected on the 3D grid using 
the Nearest Grid Point assignment scheme. Emission is 
modeled as a field where each cell acts as a single photon 
source. 

These intrinsic luminosities are modulated by two ad- 
ditional factors to give the effective source luminosity: 



N^E = NX /esc X B. 



(8) 



The first factor is the escape fraction /esc which mod- 
els the actual fraction of radiative energy which manage 
to escape the stellar environment. Typical values can 
be as high as 20% and is essentially a free parameter 
which allows to tune the reionization redshift. The sec- 
ond factor, 5, is called here the boost factor. It is a 
correction term that compensates from the unresolved 
star forming halos in the simulation, a major resolu- 
tion effect on the simulated star formation history. As 



shown in e.g. iRasera fc Teyssierl (|2006f ). the mass reso- 
lution has a significant impact on the star formation his- 
tory (SFH) if large simulation volumes are considered, 
when the minimum resolved halo mass (optimistically 
set to 100 dark matter particles) is larger than the mini- 
mum mass for s tar form i ng ha l os, based on atomic cool- 
ing arguments (jGnedinl 120001 : iRasera fc Tevssierl 120061 : 
iHoeft et al.l [20061 ). This minimum mass (also referred 
to as the Filtering Mass) starts around 10^ before 
reionization and then rises ste adily as (1 -\- z)^l^ from 
redshift 6 to the final epoch (jRasera fc Tevssierl 120061 : 
iHoeft et"ani2006f ). Resolving this minimum mass before 
reionization will require a dark matter particle mass be- 
low 10^ M0, a rather strong requirement for cosmological 
simulation. Only our smallest box size (12.5 Mpc/h with 
1024^ particles) barely satisfies this criterion. 

As an illustration, the top panel of Figure [T] shows the 
evolution of the integrated number of photons with time 
in 4 simulations at different resolutions, which depends 
directly on the simulated SFR. Clearly the difference in 
resolution has an impact on the apparition of the first 
sources : low resolution simulations require a longer time 
to reach the epoch of the formation of the first stars : 
z ~ 11 for the 100 Mpc/h simulation versus 2: ~ 18 
for the 12.5 Mpc/h simulation. Furthermore, this late 
start is not compensated by a higher SFR and at z = 6, 
the number of emitted photons decreases as lower spatial 
resolution are considered. 

We used the analytical model of IRasera fc Tevssierl 
('2006') to compute the expected converged star formation 
history. We can compensate for the unresolved star form- 
ing halos by boosting each resolved UV emitting sources 
by a "boost factor", derived to put the actual simulated 
SFRs (and hence the number of emitted photons) in ac- 
cordance with the converged one. We have used for the 
boost factor the following simple functional form: 



B(t) — min(l, a5 exp(/c5/t)) 



SFRconverged (^) 



SFR 



■actual 



(9) 



where t is the age of the Universe. The parameters a5 
and kh are fitted in the measured SFH in each simula- 
tion. They hence depend on the resolution and are given 
in Table [1] for 1024^ + 2 levels of refinement simulations 
with the WMAP-5 cosmology. The resulting integrated 
photon numbers is shown in the bottom panel of Figure 
[1] and exhibit a good level of convergence at redshifts 
2: < 9. Let us emphasize that the two parameters, /esc 
and B are different by nature : B is not a free parameter 
and follow from the proper analysis of resolution effect 
and is in some sense a pure numerical correction. Mean- 
while, /esc remains as a physical parameter which models 
e.g. the subgrid physics and in the end, serves mostly to 
set the redshift of reionization. Of course, this simple 
prescription does not fix the late apparition of stars at 
low resolution since it only corrects existing stars without 
creating new sites of stellar formation. In our investiga- 
tions, it appears that low resolution simulations (with the 
largest correction) exhibit similar behaviors than highly 
resolved ones but we admittedly focus on average quan- 
tities and global distributions. Fine geometrical details, 
on the other hand, are likely to be poorly captured by 
this boost factor approach, because of the lack of small 
emitters in unresolved site of stellar formation. 
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Fig. 1. — Integrated number of emitted photons in units of 
the number of hydrogen atoms as a function of redshift in four 
(1024^+2 levels of refinement) hydrodynamical simulations with 
different coarse resolutions. The top panel shows the original emis- 
sivity due to the simulated stellar population, while the bottom 
panel shows the boosted emissivity which compensates the impact 
of resolution on the simulated SFR. All plots are performed with 
/esc = 0.03. 

3. RESULTS 

Several aspects were investigated during this work, 
starting from basic numerical experiments focusing 
mostly on the resolution effects to slightly more complex 
modelisation where we attempt to fit observational data. 
First, we describe our fiducial results obtained from the 
post-processing of AMR simulations. From there, we dis- 
cuss the impact of sub-cell clustering to the modelisation 
with a focus on the quantity of absorbants at z ~ 6. 

3.1. Fiducial experiments 
3.1.1. Global properties 

The fiducial experiments consist in four simulations 
described in Table [TJ with comoving box size ranging 
from 100 Mpc/h to 12.5 Mpc/h. The dynamical simu- 
lations were performed on a 1024^ coarse grid + 2 level 
of refinement at z ~ 6 while the radiative transfer post- 
processing was computed on a 1024^ regular grid. Highly 
resolved simulations are expected to better resolve the 
small scales photon sinks but lack the strong and rare 
sources that populate large scale volumes. Conversely, 
large simulation have a better representativity of rare 
and strong events but lack the resolution on absorbants. 
This features are somehow reflected in the values of the 
escape fraction shown in Table [1] : for a redshift of reion- 
ization chosen to be zion = 6.5, /esc decreases with the 
box size from 0.055 to 0.02. Highly resolved simulations 
have sources embedded in highly clustered gas, implying 
a more efficient recombination, and these sources cannot 
be as strong as the ones found in large volumes. Overall, 
such simulations require a larger amount of photons to 
reionize. 

Maps of the distribution of neutral gas are shown in 
Fig. [2] at half reionization. Let us recall that these 
four simulations were performed with initial conditions 
that shared the same set of phases leading to similari- 
ties in the global spatial distributions. These maps ex- 
hibit the expected global behaviors: high resolution sim- 
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Fig. 2. — Neutral hydrogen density maps at z ~ 7.3 and x ~ 0.5 
(volume weighted) for boxes of comoving length 100, 50, 25, 12.5 
Mpc/h. All maps have a resolution of 1024^ and a thickness of 5 
Mpc/h. The color scale is logarithmic with blue and red regions 
standing respectively for low and high density of neutral hydrogen. 
Coordinates are expressed in comoving Mpc/h. 
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Fig. 3. — Same as Figure[2]but zooming on a photoionized region. 
Coordinates are expressed in comoving Mpc/h. 

ulations present complex ionization fronts, which result 
from the highly inhomogeneous structure of the absorb- 
ing regions. Meanwhile, low resolution simulations fail to 
resolve small scale structures leading to smoother fronts. 
Looking at the details of zoomed maps (see the bottom 
plots in Fig 12]), highly resolved simulations present dense 
neutral clumps within ionized regions whereas these ab- 
sorbants are absent from large under-resolved boxes. The 
failure of large simulations boxes to resolve these small 
scales will prove to be crucial in our ability to reproduce 
the data at z ^ 6. 

3.1.2. Neutral fraction 

The evolution of the volume- weighted ionized fraction 
Xy and neutral fraction 1 — Xy is shown in Fig. HI Escape 
fractions were chosen to achieve reionization at z ~ 6.5 
and it can be seen that all four calculations present sim- 
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Box size 


db 


h 


/esc 


mdm 


^bar 


■^star 


Mpc/h 




Myr 




Mo 


Mo 


Mo 


12.5 


0.7 


300 


0.055 


1.52x10^ 


2.54x10^ 


5.81 xlO'* 


25 


1.0 


650 


0.030 


1.22x10^ 


2.03x10^ 


4.65 xlO^ 


50 


1.2 


1500 


0.020 


9.76x10^ 


1.62x10^ 


3.72 xlO^ 


100 


1.2 


3000 


0.020 


7.81x10'^ 


1.30x10'^ 


2.97x10'^ 



TABLE 1 

Summary of the parameters used in our simulation suite. Parameters and are used in the SFR correction to account 
for finite resolution, assuming wmap-5 cosmology, /esc is the assumed escape fraction, mdm is the mass resolution of 

DARK MATTER PARTICLES WHILE mbar IS THE MASS RESOLUTION PER AMR GRID CELL. AlSO SHOWN IS THE MINIMUM STAR PARTICLE MASS. 

All simulations were performed with 1024^ dark matter particles. 



ilar behavior for z < 9. Distribution of values are shown 
as colored contours in Fig. [5] and it can be seen that Xy 
is representative of the distribution of ionized fraction 
in the boxes as it tracks accurately the most probable 
value. For earlier times, notable differences arise from 
the impact of resolution on star formation and the pro- 
duction of photons. Highly resolved simulations have 
ionization history that expand up to z = 18 where their 
first stars form. Conversely the largest simulation forms 
stars only at z = 11. Because of the introduction of a 
time-dependent boost of their sources luminosity, these 
large simulations quickly catch up the highly resolved 
one, resulting in a strong slope for Xy. The 'catching 
up' effect is clearly noticeable in the 100 Mpc/h calcula- 
tion but already much limited for the 50 Mpc/h simula- 
tion, almost unnoticeable for the 25 Mpc/h box and for 
z > 9 the calculations have all converged. This should 
not come as a surprise since the boost factor approach 
was designed precisely for that reason. Nevertheless, the 
different clustering of gas, of sources as well as their 
number could have resulted in a difference in the ion- 
ization history of the different calculations even though 
they share the same global amount of photons emitted. 
Since we do not observe such a discrepancy, it suggests 
that small photon sources missing from the large boxes 
are located roughly at the location of the large photon 
sources present in these boxes: by boosting their lumi- 
nosity, we compensate at the sub-grid level for the lack 
of stellar particles at the correct location. 

Let us also point out that the neutral fraction calcu- 
lated at z = 6 spans from 3 x 10~^ for the 100 Mpc/h 
simulation to 10~^ for the 12.5 Mpc/h. Such levels of 
neutral fr action ar e inco nsistent with constraints pro- 
vided by IFan et al.l (|2006f ) from Gunn-Peterson troughs 
in quasars spectra which imply a typical level of 10~^ 
at z=6. Even though this estimation relies on assump- 
tion on the distribution of gas and on an homogeneous 
field of radiation, this level o f neut ral gas has been repro- 
duced by e . g. iTrac fc CenI pOOl . iKohler et all (|20Q7i) . 
Shin et all (|2008[) an d on highly resolved simulations by 



Gnedin fc FanI (|2006[ ). On the other hand. lFinlator et al.l 
(|2009b[ ) present the same level of discrepancy at admit- 
tedly much lower resolution. We investigate this point 
further on in subsequent sections, but at the current 
stage, all our simulations fail to reproduce the observed 
neutral fraction without additional modelisation. 

3.1.3. Radiation field- UV intensity 

The moment-based description of radiative transfer al- 
lows us to track the radiation intensity in each cell, usu- 
ally described in terms of J21 or in the terms of photoion- 
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Fig. 4. — Neutral and ionised volume- averaged fraction as a func- 
tion of redshift measured in the 100, 50, 25 and 12.5 Mpc/h boxes. 
Dots stand for observationnal constraints given by Fan et al.l 
((2006h . 
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Fig. 5. — The evolution of the hydrogen neutral fraction jui in 
the 100, 50, 25 and 12.5 Mpc/h boxes (jrom top to bottom). The 
blue lines stand for the evolution of the volume-weighted average 
value and the color levels show the evolution of the fni distribution 
with redshift. Values with a high probability are shown in red and 
values with a low probability are shown in blue. 



ization rate ri2. The evolution of the volume averaged 
intensity is s hown in Figure [6] as well a s the constraint 
provided by iBolton fc Haehneltl ()2QQ7l ). For our four 
boxes, the evolution of the radiation intensity exhibits 
a 'cobra-like' shape with a sharp increase until the reion- 
ization epoch, during which the increase is the steepest, 
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Fig. 6. — Evolution of the mean intensity of ionizing radiation in 
the 100, 50, 25 and 12.5 Mpc/h boxes. The upper hmit at z ~ 6 
stands for the constraint given bv iBolton Haehneltl (l2007|). 

followed immediately after by a flattening of the slope 
at the end of reionization. The amount of radiation is 
larger by a factor of 3 at z=6 when comparing the 100 
Mpc/h and the 12.5 Mpc/h box, while the 25 Mpc/h and 
12.5 Mpc/h calculations seem to have converged. This 
trend is consistent with the differences in the residual 
neutral fraction calculated at z=6 (see Figure H]), where 
the highly resolved calculations present a larger amount 
of neutral gas than the poorly resolved ones. A stronger 
radiation field imply a larger photoionization rate and 
therefore a smaller amount of neutrals. 

Unfortunately, these calculations all agree on one 
point: they are inconsistent with observational con- 
strain ts, such as the one provided by Bolton & Haehneli 
(|2007[ ) by a factor of 20 to 50. Again, such a dis- 
crepancy has recently been found at lower resolution by 
iFinlator et all (j2009b[ ) using an alternative implementa- 
tion of moment based radiative transfer. This excess of 
radiation goes along with the lack of neutral gas at the 
end of reionization in our computations. Furthermore, 
an inspection of Figure [7] reveals that the average inten- 
sity is representative of the most probable value in the 
boxes, discarding any possibility of a biased value. 

3.1.4. Optical Depth 

The Thomson scattering of CMB photons by the 
electrons released during the reionization is quantified 
through the Thomson Optical depth given by 

dt 

r = c(Jt J ne{z)—dz, (10) 

where at is the corresponding cross-section and = 
xriH is the density of electrons released by ionized hy- 
drogen atoms. Our calculations of the optical depth is 
presented in Figure [8] for the four simulations. Also pre- 
sented is the constraints range obtained from the 5 years 
release of C MB measurements m ade by the WMAP col- 
laboration ()Komatsu et al.ll2009f ) at the la level. 

The four fiducial experiments were performed at differ- 
ent resolution and therefore exhibit different ionization 
histories but have converged in terms of optical depth. 
This agreement result from the fact that the bulk of 




redshift Z redshift Z 

Fig. 7. — Evolution of the mean intensity J21 in the 100, 50, 25 
and 12.5 Mpc/h boxes (from top to bottom). The blue lines stand 
for the evolution of the volume-weighted average value and the 
color levels show the evolution of the J21 distribution with redshift. 
Values with a high probability are shown in red and values with a 
low probability are shown in blue. 

electrons production lies within the convergence redshift 
range (z<ll). The four of them present an optical depth 
r = 0.051 which lies at 2a from the CMB value. The 
inclusion of helium electrons would sligh tly increase the 
amount of electrons (see e.g. IFinlator et al.., 2009b) but 
probably not at levels that would make it consistent with 
the WMAP expected value. This discrep ancy has al- 
ready bee n noted bylGnedin fc FanI (|2006[ ) , iTrac fc Cen] 
(|2007[ ) or IFinlator et all (j2009b[ ) for simulations with 
similar ionization redshifts. Also shown in Figure [8] is the 
optical depth measured in the largest box (100 Mpc/h), 
but with larger and larger escape fractions. These cal- 
culations were performed at the same level of resolution 
than the fiducial experiments and under the same proto- 
col. Clearly, the resulting r gets closer to the CMB value 
as a consequence of a larger density of photons at earlier 
times. It indicates that a larger escape fraction could 
be the solution toward an agreement, however it comes 
at the cost of a larger redshift of reionization as shown 
in Figure [9] which would place the z=6 neutral fraction 
even further to the observed constraints than the fiducial 
ones. It is therefore likely that a plausible path toward 
an agreement between the observed and the calculated r 
lies in a varying escape fraction which would ensure an 
extended ionization history to increase r while keeping 
the reionization redshift reas o nably low. For instance in- 
vestigations by IWise fc CenI (|2009) suggest that higher 
escap e fractions could exist in small galaxies. Further- 
more llliev et all (|2007h argue that early populations in 
small haloes at 2; ^ 22 are important contributors to 
the optical depth, and these objects are missing in our 
calculations. Other routes toward agreement may lie in 
additional physics such as the inclu sion of specific pop- 
ulation III sources (Trac fc Cen 2007). Finally it should 
be noted that Shull fc Venkatesan (2008) suggest that in- 
accuracies in the reionization history and degeneracies in 
cosmological parameters lead to a larger range of possible 
values of r, 0.06-0.11 at la : the discrepancy between our 
calculation and the CMB constraint could be resolved in 
between at r ~ 0.07. 
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100 Mpc/h fesc=0.008 
100 Mpc/h fesc=0.02 
100 Mpc/h fesc=0.055 
100 Mpc/h fesc=0.08 
50 Mpc/h 
25 Mpc/h 
12.5 Mpc/h 




Fig. 8. — The Thomson optical depth computed from the av- 
erage mass weighted electron density. The gray area stands 
for the WMAP-5 measurements allowed range at la level (see 
iKomatsu et al.. (,2009. )). The 50, 25 and 12.5 Mpc/h curves are 
almost superimposed. 
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Fig. 9. — Evolution of the volume weighted neutral and ionized 
hydrogen fraction in comoving 100 Mpc/h-10243 boxes with differ- 
ent escape fractions. 

3.1.5. Density dependance of UV intensity and neutral 

fraction 

In order to investigate our results on the average neu- 
tral fraction and UV intensity, we have computed the 
distribution {l—x){nH) and J2i{tih) just after the reion- 
ization at z=6.25. The results are shown in Figures [10] 
and [TTl From Figure [TOl we can see that the radiation 
field is not strictly homogeneous: although it is quasi- 
constant for densities tih < 0.001 cm~^, we see a signif- 
icant decrease of the flux in the densest regions. In par- 
ticular, an accumulation of obscured regions is apparent 



at densities close to 5 x 10~^ cm~^ with a radiation field 
1000 times weaker than the volume average. This feature 
is more proeminent in highly resolved simulations (12.5 
and 25 Mpc/h) and corresponds to a better treatment of 
dense, small absorbants, which we fail to model properly 
in large boxes. The strong cutoff of radiation in high- 
density regions is a manifestation of self-shielding, where 
dense clumps are protected from the ionizing background 
by their own high densities. 

We model this high-density behavior using two differ- 
ent models with different levels of exponential cutoff : 



and 



Jiinn) = Joex.p{-nH/nl) 



MriH) = Joex.p{{-nH/nl)^), 



(11) 



(12) 



where Jo stands for the average intensity field at low 
density and 4 is the characteristic density at which the 
exponential cutoff operates. For sake of simplicity, we ar- 
bitrarily assigned the same values for the characteristic 
self-shielding densities for the four simulations, namely 
nl = 0.007 cm~^ and = 0.018 cm~^, which repro- 
duce accurately the global J2i{nH) behaviors in the three 
largest boxes and is slightly off for the 12.5 Mpc/h calcu- 
lation. On the other hand, the plateau value is computed 
separately for the 4 box sizes. Clearly, the J4 model is 
a better representation of the density dependence but 
the other model Ji has also been considered for sake of 
comparison. It should be noted that the volume- average 
value (J2i)(^i/) per density bin is also shown on Figure 
[Tm as the white solid line. Surprisingly, it increases to 
levels up to 100 times the volume average value, instead 
of following the exponential cutoff. While the latter is 
a good model for the most probable radiation flux, it 
does not predict the correct volume- average value. The 
discrepancy starts at densities close to 10~^ cm~^ and 
is likely to be due to strong sources of radiation which 
are found inside galaxies. This illustrates the fact that 
the radiative intensity may be subject to strong biasing 
effects, especially at high densities, and its average value 
must be therefore taken with caution. 

These models for the UV radiation field can be used to 
compute the expected neutral fraction, assuming pho- 
toionization equilibrium. The result of such a proce- 
dure is shown in Figure [11] for the four fiducial sim- 
ulations with the equilibrium (1 — x){nH) curves com- 
puted for a uniform radiation field equal to Jq and for 
the small/strong cutoffs models Ji and J4. Also shown 
is the average neutral fraction per density bin. Clearly 
the average neutral fraction follows the equilibrium trend 
at low densities {uh < 0.001 cm~^) which is well re- 
produced by the three models. For higher densities 
{riH > 0.01 cm~^), the average neutral fraction rises 
sharply and its distribution presents a significant tail to- 
ward neutral gas, even though the spread remains quite 
important : for instance at tih ^ 2 x 10 ~^ neutral frac- 
tions from 10~^ to 1 can be found with almost equal 
probabilities. This tail cannot be reproduced by the uni- 
form UV field model Jo which is not surprising since Jo 
is not representative of the radiation field in which self- 
shielded high density regions lie. The small cutoff model 
Ji does a better job at reproducing the tail but under- 
estimates the strength of rise toward larger neutral frac- 
tions. The strong cutoff model J4 is in better agreement 
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which is also expected since it models more accurately 
the typical trend of the UV field as a function of den- 
sity (see Figure [To]). This overall agreement between the 
computed radiation field and neutral fractions indicates 
that the global neutral fraction can be recovered assum- 
ing photoionization equilibrium, as long as the correct 
model for self-shielded UV radiation is use^. 

3.2. Subgrid clumping model 

Our fiducial numerical experiments, albeit highly re- 
solved in terms of radiative transfer, lack some resolution 
for the underlying gas distribution. From Table [T] the 
largest simulation fail to resolve star forming haloes at 
z > 11 and ~ 10^ Mq mini-haloes which are expected to 
act as a sink of photons during the reionization process. 
On the other end of our sample of simulation, the small- 
est boxes reasonably resolve these scales but are too small 
to e.g. provide a correct description of the cosmological 
HII regions whi ch are expected to be as large as tens 
ofMpc (see e.g. iBarkana fc Loebll2QQll : iMcQuinn et al.l 
I2QQ7I ). From now on, we focus on the largest simula- 
tion (100 Mpc/h with 1024^ dark matter particles) but 
with an additional subgrid model, in order to combine 
large scale statistics with a corrected small-scale physical 
model. From now on, we only compare our calculations 
to the constraints at z = 6 from quasars spectra, and 
put Thomson optical depth measurements aside. Since 
this quantity is more sensitive to the global reionization 
history, the fact that our star formation history starts at 
z = 11 in the largest box cannot be compensated by any 
other means than just increasing the mass resolution or 
drastically changing the star formation recipe. Exploring 
these possibilities is postponed to future work. 

3.2.1. Clumping factors 

When considering the hydrogen chemical balance equa- 
tion, one gets: 

= aneUHii - TriHi (13) 

which is modified in the following manner if one consider 
the ionization fraction x : 

- = anjjx'^ - (1 - x)n//r, (14) 

where tih stands for the hydrogen number density (neu- 
tral+ionized), a and F are respectively the recombi- 
nation and photoionization coefficients, and x is the 
usual ionized fraction. As we deal with fields defined 
on a grid, we only have access to quantities averaged 
within the cells such as {uh) which lack some infor- 
mation on the subgrid variations. Defining a recombi- 
nation clumping factor as Cr = {n'jjx'^) / (nnx)'^ and 
a photon-atomic density cross clumping factor Cj = 
(n^(l — x)nh) / (n^) {{1 — x)nH)^ the chemistry equation 
can be rewritten as: 

^ = -{a){nHfx^CR - (1 - x)ca{nH){n^)Ci. (15) 

The choice of definition fo r clum ping factor is not unique, 
for instance iKohler et al.l (|2007[ ) define clumping factors 

^ Incidentally, it also shows that some consistency is achieved 
within our code 



where the averages are take over the whole terms such as 
(a(T)n|^x^), which depends on density, ionization frac- 
tion and also temperature. Our choice of clumping fac- 
tors basically assumes that temperature is distributed 
uniformly within the computational cells. 

We compute the clumping factors for the 100 Mpc/h 
simulation using the 12.5 Mpc/h simulation. For 6 < 
z < 18, the total Jeans Mass in the case of an adi- 
abatically cooling gas decreases from 1.5 x 10^ Mq to 
4 X 10^ Mq while the baryonic filtering mass evolves 
from 10^M (T) to 5 x IO^M^t) d uring the same inter- 
vall (see e.g. IGnedin fc Huilll998l : [Barkana fc Loebl 120011 : 
iMcOuinn et al.ll2007[ ). From Table [H one can see that 
our mostly resolved simulation almost achieves this level 
of resolution. Our model should therefore provide a rea- 
sonable description of the density distribution at small 
scale. 

All the 8x8x8 cubes in the 12.5 Mpc/h simulation are 
considered in the present analysis. Clumping factors are 
computed by averaging the relevant quantities on these 
512 cells. This 8^ cells volume in the 12.5 Mpc/h cor- 
responds to the volume of one cell in the 100 Mpc/h. 
The distributions of Cr and Cj as a function of the den- 
sity riH are shown in Figure [12] and [13] for six bins of 
ionization levels. This distributions were performed by 
averaging 6 snapshots between 13.1 < z < 5.9, yielding 
clumping factors which do not depend on redshift but 
only on the physical properties of the cell. This choice 
aims at simplicity and can suffer from two caveats, one 
statistical, the other more physical. First the distribu- 
tion can be skewed by a snapshot which dominates the 
other. Second, by summing the contributions at all red- 
shift we basically ignore a possible redshift dependance 
of the clumping and somehow ignore the ionization his- 
tory of a cell. Still, we checked that the models described 
below still holds on a redshift by redshift basis. 

Clearly, the Figures [12] and [T3] present an important 
spread of values around the mean trend (shown in white). 
It should not come as a surprise since we somehow pro- 
jected the clumping factors on the x, tih space, putting 
aside e.g. the temperature or the local ionization field. 
Still some trends can be fitted to a good level of approxi- 
mation. Considering the distributions of Cr first the dis- 
tribution are reasonably fitted by Cr ~ n^'' trends, es- 
pecially considering ionized fractions greater than 0.005. 
The fit is poorer for smaller x but such level of neutral 
gas do not contribute much to recombination. The same 
normalization can be applied for 0.005 < x < 0.2 (see 
the green dotted curves) but a smaller normalization (red 
dashed line) seems to be necessary to fit the mean trend 
for the last class of ionization . Such results ar e consis- 
tent w ith the 'B' clumping factors found by Kohl er et al.l 
(2007). Interestingly, we also recover their 'A' clumping 
factors which are biased toward high densities and which 
follow a Cr ~ n'jf power law (shown as dashed dark 
lines) : a subsample of cells follow this trend for high 
densities like the outliers in the 0.04 < x < 0.2 panel or 
the high density rise of the distribution in the x > 0.2 
panel. By looking at distribution for single redshifts (not 
shown here), Cr coefficients with the same 'A' trend can 
be found for all the ionized fractions x > 0.0005 at the 
highest densities. But their overall weight is such that 
volume averaged trends tend to follow a 'B' law with a 
gentler slope (shown in white), 0.7 instead of 2.5. 
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Fig. 10. — The density contrast vs ionizing intensity (as J21) relations measured in the 100, 50, 25 and 12.5 Mpc/h boxes at 2; ~ 6.25, 
which corresponds to a fully ionized simulation. Red (resp. blue) regions stand for high (resp. low) probabilities in the distributions. The 
white line stand for the average ionizing intensity per density bin {J2i)(nH). The dashed red line stands for the volume average J21 = Jo 
value, the dashed black line stand for the J21 = Jo exp(— ni:f /n^) model and the dotted line stand for the J21 = Jo exp(— (riiiz/nj)^) model. 



From there it is clear that a single normalization of 
the Cr ~ n^jf^ law cannot be representative of all the 
ionization fraction classes. Therefore we choose to per- 
form two sets of runs, one with a 'high normalization' 
(shown in green in Figure [T2)) and one with a 'low nor- 
malization' (shown in red). The former is adequate for 
X < 0.2 but overestimates clumping in ionized regions 
while the latter underestimates clumping in regions with 
high neutral fraction but is a better fit of the clumping 
in ionized cells. Again, this is consistent with the result 
found by iKohler et al.l (|2QQ7[ ) who studied the neutral 
fraction dependence of the clumping factors. Further- 
more and as shown hereafter, no strong differences can 
be noted between these two calculations suggesting that 
a more detailed Cr with an x dependance (which should 
lie in between) would lead to similar results. 

Considering next the photoionization clumping C/, 
which distributions are shown in Figure [131 it clearly 
appear that the clumping is less pronounced than for re- 
combination. The mean trend can be fitted by Cr ~ 
laws (shown as red and green lines in the panels), which 
we used in the subsequent experiments. It should be 
noted that we recover again a (7/ ~ n'j^^ for high density 
outliers but they are not representative of the overall dis- 
tributions of Ci values . In the end, it appears that the 
photoionization clumping factors are much smaller than 
recombination one and incidentally during this work we 
performed calculations with Cj = 1 (and Cr ~ n'jj^)- 
It is equivalent to assuming an homogeneous UV back- 
ground : from the Figure [TOl it is quite clear that J21 does 
not depend strongly on the density for a large range of 



values and the approximation made by choosing Cj = 1 
should be reasonably close to the actual clustering. 

3.2.2. Results 

The simple clumping models described in the previ- 
ous section were added to the basic version of the chem- 
istry/coohng module, and radiative transfer has been 
performed again for the 100 Mpc/h box, with the same 
boost factor as the one given in Table [H The simple vi- 
sual inspection of the fields is quite informative on the 
impact of our subgrid clumping model on transfer cal- 
culations : Figure [M] presents the same slice within the 
box, at the same instant during the pre-overlap phase but 
for calculations with or without subgrid model. First 
the overall geometry can be recognized in both calcu- 
lations, however the experiment with subgrid clumping 
model presents less extended ionized regions indicating 
that the overall chronometry has been modified : with 
clumping the radiation field is less efficient in ionizing 
the gas and requires therefore a longer time to achieve 
a certain level of ionization. Moreover, the experiment 
without clumping present ionization front which appear 
smoother than they are in the subgrid clumping model, 
reflecting again the increased difficulty for radiation to 
pass through higher density regions. Finally, if one looks 
closer at photo- ionized regions, much more pockets of 
neutral gas are seen in the clumping model, as a conse- 
quence of the larger recombination rate. 

To assess more quantitatively these aspects, we present 
in Figure [15] the same distributions as in section 13.1.51 
namely x{nH) and J2i{'^h) but within our clumping 
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Fig. 11. — The density contrast vs neutral fraction relations measured in the 100, 50, 25 and 12.5 Mpc/h boxes at z=6.25, which 
corresponds to a fully ionized simulation. Red (resp. blue) regions stand for high (resp. low) probabilities in the distributions. The white 
line stands for the average neutral fraction per density bin (1 — x)(nH)- The red line stand for the expected neutral fraction assuming 
equilibrium and an ionizing intensity equal to Jq. The dashed and dotted line stand respectively for the expected neutral fraction assuming 
equilibrium for the J21 = Jq exp(— ni://n^) and J21 = Jq exp(— (niij/nj)^) models. 
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Fig. 12. — Clumping factors Cr as a function of the density 
computed from 8x8x8 cells in the 12.5 Mpc/h simulation and 
used in the 100 Mpc/h experiment. The six panels show the Cr 
distributions at different ionization levels : the solid line shows 
the {CR){nH) trend while the green dotted and the red dashed 
line stand for Cr ~ models with respectively a high and 

low normalization. The black dashed line stand for models with 
Cr ~ at high densities. 

factor model. These distributions are given at a post- 
overlap redshift {z = 5.92). Like previously, the volume- 
averaged radiation field follows closely the fiux in low 
density regions {nn < 5 x 10~^ cm~^), where its in- 



FlG. 13. — Clumping factors Cj as a function of the density com- 
puted from 8x8x8 cells in the 12.5 Mpc/h simulation and used 
in the 100 Mpc/h experiment. The six panels show the Cj dis- 
tributions at different ionization levels : the solid line shows the 
{Cj)(nH) trend while the green dotted and the red dashed line 
stand for Cj ~ models with respectively a high and low nor- 
malization. The black dashed line stand for models with Cj ~ 
at high densities. 

tensity is quasi-constant. From this density upward, a 
strong exponential cutoff is observed, with a radiation 
field three orders of magnitude smaller than the average 
value. Clearly, high density regions live in a radiation 
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Fig. 14. — Neutral fraction maps (red zones are neutral, blue 
ones ionized) in a slice of thickness equals to 9.7 kpc/h comoving 
at z=6.97. Distances are given in pixels and each map side covers 
100 Mpc/h comoving (left column) and 19.53 Mpc/h comoving 
(right column). The top row pictures stand for the calculations 
with subgrid clumping, the bottom one for the same calculation 
without it. 

field different tiian tiie rest of tiie simulated volume. We 
use the same type of models Jo, Ji and J4 as previously 
with nl = 0.006 cm~^ for both clumping models and 
n| = 0.016 - 0.025 cm~^ for resp. the high and low 
normalisation models. We recompute the equilibrium 
ionized fraction and compare it to the distribution actu- 
ally found in the numerical experiment. The calculations 
are performed assuming the same clumping models as 
the one used during the simulation and shown in Figure 
[TBI When compared to the calculation without subgrid 
clumping, it can be noted that the fraction of neutral is 
more important and that gas tend to be more neutral 
at a given density. The "cobra rise" of the average J21 
as a function of z is steeper with clumping and satu- 
rates at lower density than it used to. At density close 
to 5 X 10~^ cm~^, the distribution of neutral fraction 
is clearly bimodal: a first peak stands for high density 
regions which are ionized 1 — x ^ 0.001 while a second 
population has 1 — x ^ 1. It indicates that some gaseous 
regions are sufficiently embedded and/or recombine fast 
enough to be "spared" by the radiation field. Again the 
Jo model is completely off the mean 1 — xinn) trend for 
high density regions uh > 0.001 cm~^, even though this 
level of radiation is effectively the one measured when av- 
eraging over the whole volume. Meanwhile the Ji does 
a better job and J4 appears to be a good match which 
is not surprising since they are better fits to the local 
density dependance of the UV background. Because of 
self-shielding, it is clear that the ionized state of high 
density regions cannot be deduced by the simple gener- 
alization of the average UV field found in the simulated 
volume. 

3.2.3. Comparison to observational constraints 

We first consider the evolution of the volume and mass 
averaged neutral fraction, shown in Figure [T71 Compared 
to the experiments without subgrid clumping, the frac- 
tions are typically one order of magnitude larger when 
subgrid structures are modeled. At z = 5.9, the volume 



averaged neutral fractions are equal tox~5xl0~^ 
and the mass averaged to x ~ 3 x 10~^. The differ- 
ences between the low and high normalization models for 
the clumping are small with higher neutral fraction for 
the high normalization model, which is expected since it 
overestimates the recombination rate in the most neutral 
regions. These levels of neutral gas a re consiste nt with 
observational constraints provided by iFan et ahl (j2006l lFl 
and for the two types of averaging and indicates that high 
resolution or subgrid clumping are required in order to 
match the data. For instance, the same agreement has al- 
ready been obtained bv Trac fc CenI ()2007[ ) using directly 
the particles as a source of clumping in a high resolu tion 
pure D M simulation. Other examples are Kohler et al.l 
()2007[ ) using clumping factors and Gnedin fc Fan (2006*) 
using small boxes calculations for the volume weighted 
neutral fraction. 

The agreement found in the current work is encourag- 
ing but should nevertheless considered with some care. 
First, the clumping models are simple and lack a de- 
tailed dependance on e.g. the temperature or UV field. 
But even with a more accurate description of the sub- 
cell physics, clumping factors will remain as a trick to 
cope with inadequate resolution and the forthcoming ef- 
fort should concentrate on improving the resolution using 
larger simulations. Furthermore, we still lack the cou- 
pling with the hydrodynamics which is likely to affect 
to small scale features of reionization and such physics 
cannot be assessed using only a subgrid clumping model. 
In the end, we estimate that given the simplistic aspect 
of our clumping model, the current agreement should be 
seen as a sign that the overall direction is correct, but 
it is not clear that improving the model is worth the 
effort, comparing to inc reasing the resolution of the sim- 
ulation. Second, iTrac fc Cen (2007) already noted that 
the constraints provided by Fan et al.l (j2006f ) also depend 
on some rather strong assumptions like the strict unifor- 
mity of the UV background or a smooth gas distribution. 
In the most pessimistic case, the present agreement can 
be fortuitous, even though an agreement on both the 
volume and mass averaged neutral fraction is unlikely to 
happen by accident. 

It should however be noted that the average neutral 
fractions do not correspond to any feature in the prob- 
ability density distributions. In Figure [TH we overlay 
the evolution of the averaged neutral fractions on the 
evolving distributions of probabilities. Red regions stand 
for high probabilities while blue ones stand for lower 
probabilities : after the overlap, the distributions are 
clearly dominated by strongly ionized regions {x ~ 10 
which correspond mostly to low gas densities. The mass- 
weighted distribution enhance the contribution of dense 
cells and which in turn push the average neutral fraction 
toward higher values {x ~ 10 ~^). However, the values of 
the average neutral fractions are never coincident with 
the maximum of the distribution. On the contrary these 
averages lie in the transition region between low neu- 
tral fraction and high neutral fraction regions. In other 
words, the average values lie in-between two characteris- 

^ The observational constraints shown here w ere recomputed 
from the transmission tables of Fan et al.l (|2006l ) using the same 
cosmology as the one used in our calculations. It results in a rela- 
tive variation of 20%. 
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Fig. 15. — Density dependance of the ionizing intensity in the 100 Mpc/h box with subgrid clumping. The red dashed hne shows 
our constant ionizing background model Jo, the dashed black line our J21 = Jo exp(— n/z/nj) model and the dotted black line our 
J21 = Jo exp(— model. The white lines show the average intensity per density bin. The top row results were obtained assuming 
the clumping law with high normalization and the bottom one with low normalization. 




Fig. 16. — Density dependance of the neutral fraction in the 100 Mpc/h box with subgrid clumping. The lines show the neutral fraction 
for our three ionizing background models, assuming photoionization equilibrium and a clumping factor model. The white lines show the 
average neutral fraction per density bin. The top row results were obtained assuming the clumping law with high normalization and the 
bottom one with low normalization. 



tic regions of the gas distribution, but is representative 
of neither of them and consequently is not a good proxy 
of the physical states that coexist inside the simulation. 

In Figure [181 we present the evolution of the distribu- 
tion of the UV background in the 100 Mpc/h boxes with 
subgrid clumping with the mean value super iniposed and 
the constraint provided by iBolton fc HaehneTtI (|2007[ ) af- 
ter the overlap. Compared to the same calculation with- 
out clumping, some improvement can be seen and the 
intensity of the UV background has been reduced by a 
factor 2 or 3 (depending on the kind of normalization 
applied to the recombination clumping). It can be noted 
that the same ratio was already observed between the 100 
Mpc/h and the 12.5 Mpc/h (which served to calibrate 
our clumping model) boxes in our fiducial calculations. 
Still, the discrepancy with the observational constraints 
remains quite large, almost one order of magnitude. Fur- 
thermore, the inspection of the distribution indicates 
that the mean value of the UV background effectively 



tracks the maximum of the J21 distribution. Therefore, 
no bias or multimodal distribution can be invoked as a 
valid reason for the discrepancy. 

This failure can be explained by the fact that very dif- 
ferent regions are at the origin of the average value for 
the neutral fraction and for the UV background values. 
After the overlap, the neutral fraction is intrinsically low 
in low density regions (1 — x ~ 10~^ — 10~^) and few 
regions with high neutral fraction push the average to 
higher values : at face value a single fully neutral cell 
weighs as much as 10^ cells with a 10~^ neutral frac- 
tion. Furthermore, if mass-weighting is considered, the 
impact of such cells is even higher since they are usu- 
ally more neutral. As a consequence, the average neutral 
fraction is pushed toward values higher than the peak 
of the distribution and dense cells (even less numerous) 
have an important impact. Considering now the average 
UV background, dense cells lie in regions where the ra- 
diation intensity is typically one thousand times smaller 
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than the typical value computed in low density regions 
because of self-shielding, which implies a smaller impact 
on the average J21. Consequently, the mean neutral frac- 
tion is not related to the mean UV background : the 
former is influenced by dense clumps while the latter is 
mainly set by voids. The fact that dense regions do not 
lie in the typical UV background explains our ability to 
reproduce the neutral fraction and our failure to satisfy 
the constraints on the ionizing radiation field. 

One may therefore ask how do we balance a discrep- 
ancy in the photoionization rate and an agreement in 
neutral fraction? First let us recall that the actual quan- 
tity measured in quasar spectra is the transmission T, 
i.e. the ratio of the observed fiux to the unobscured one 
over a given range of redshifts (see e.g. iFan et al.l[2QQl 
[2006). At the redshift considered here, the equivalent 
comoving distances are of the order of 60 Mpc/h and are 
therefore comparable to the experiments presented in the 
current work. Hence, observations gives a constraint on 

(T) - J p(A)e-"^(")^'/^^iA, (16) 

where A stands for the density contrast, a is the 
recombination rate (mostly homogeneous), Q(z) de- 
pends on the physical parameters of the Ly-a trans- 
mission and cosmology and finally p{A) is the PDF 
of the d ensity. A typical example o f such a PDF is 
given by iMira lda-Escude et al.' (2000') which i s use d in 
the models of Fan et al. (2002), Fan et al.l (j2006f ) or 
iBolton & Haehnelt (2007). In Eq. [HI the photoioniza- 
tion rate can be deduced from (T) and assuming pho- 
toionization equilibrium the neutral fraction can also be 
deduced. We have seen that the latter assumption is 
mostly verified. The relation also assumes that the Uni- 
verse is mostly ionized and that the UV background is 
homogeneous. From the expression in Equation[T6l it can 
easily be seen that the exponential cutoff implies that the 
actual density distribution at high density has no infiu- 
ence on the observed quantity, therefore their departure 
from homogeneity and low neutral fraction (measured in 
simulations) does not impact on the transmissions. Con- 
versely, it implies that the quantities which are mostly 
constrained are inferred fro m low-density reg ions (for a 
detailed discussion see e.e;. lOh Furlanettol 12005). In 
other words the photoionisation rate is more 'reliable' or 
more directly constrained than the neutral fraction and 
should be reproduced first by simulations : in principle 
the agreement on the neutral fraction should follow. In 
the current work, we show however that reproducing first 
the neutral fraction does not automatically imply an ad- 
equate photoionisation rate. 

3.2.4. Improving the model 

Which path should be taken toward a complete agree- 
ment between observational constraints and our calcu- 
lations? The most obvious free parameter we have ac- 
cess to is the escape fraction. We present in Figure [20] 
the evolution of the averaged UV background and neu- 
tral fraction for various escape fractions and using the 
same 1024^ particles 100 Mpc/h simulation with the 
high normalization clumping facto r model. P lotted along 
are the c onstraints provided by I Fan et aTl ([2006) and 
iBolton fc Haehnelt. (2007.) . As expected, lowering the es- 



cape fraction make the simulated UV background more 
consistent with observations. However, also as expected, 
the redshift of reionization decreases and for the lowest 
value of /esc = 2.5% presented here, overlap is not com- 
plete and the average neutral fraction is only at 5% at 
z = 6. Such a scenario is problematic, because it implies 
that the neutral fraction must decrease very sharply : ob- 
servational data exhibit some transmission for quasars at 
redshifts z ~ 5.9, i.e. at levels of neutral fraction close to 
1 — 0.0001 and it would imply a sudden decrease from 
X ^ 0.1 to 10~^ in a small redshift interval of Az ^0.1. 
Furthermore such a trend would also go against a bet- 
ter agreement with the optical depth measured from the 
CMB data which favor a higher escape fraction. One op- 
tion would be to use an evolving escape fraction, from 
~ 10% at z ~ 10 fown to /esc ~ 1% at z ~ 6. Prelimi- 
nary experiments (not shown here) indicate that, albeit 
helpful, this option does not easily provide a solution to 
the discrepancy. Furthermore, even if a good match is 
obtained, it would only consist in a proof of concept and 
one would have to relate this evolution to a physical pro- 
cess (like e.g. star formation). Other routes can be used 
to reduce this discrepancy. The lack of multi-frequency 
transfer implies among other things that no preheating 
occur behind ionization fronts. In particular, it would 
reduce both the recombination rate of the gas and the re- 
quired number of photons per baryons to complete reion- 
ization. Finally, the proper coupling of radiative transfer 
with hydrodynamics may prove to be crucial : low den- 
sity regions or minihaloes are likely to react to any kind of 
heating due to radiative transfer and the sour ce produc- 
tion (namely star formation) may be affected (jlliev et al.l 
I2OO5L [2007). even though self-shielding, which has been 
shown to be quite effective in our calculations, could go in 
the opposite direction. These points will be investigated 
in future work. 
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Fig. 17. — Evolution of the mass and volume averaged neutral 
fraction in the 100 Mpc/h box with a clumping factor assuming a 
high/low normalization (thin/thick line s). The values at z = 6 are 
consistent with measurements made bv lFan et al.l (|2006l ) for both 
kind of average method (dots). 

4. SUMMARY AND PROSPECTS 

We have presented a set of radiative cosmological sim- 
ulations in order to model the reionization epoch from 
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Fig. 18. — Evolution of the average ionizing radiation (blue 
line) in the 100 Mpc/h box assuming a subgrid clumping factor 
model with a low/high normalization(top/bottom panel). The col- 
ored isocontours stand for the distribution of J21 values at each 
redshift with high probability densities in red and low ones in 
blue. Th e marker at z ~ 6 shows the constraint provided by 
[Bolton &: HaehnelU (l2007i ). 

2: ^ 18 down to 2; ^ 6. The gas and dark matter dynam- 
ics, as well as the associated star formation have been 
performed with the RAMSES code, while radiative trans- 
fer has been computed by means of a moment-based for- 
malism using the Ml closure relation, implemented in the 
ATON code. The latter has been ported on a multi-GPU 
architecture using CUDA, providing an acceleration close 
to lOOx, which allows us to tackle radiative transfer prob- 
lems at high resolution (a 1024^ base grid and 2 levels of 
refinement for the hydrodynamics and a 1024^ Cartesian 
grid for the radiative transfer). 

A good level of convergence on average quantities 
(neutral fraction, UV background and Thomson optical 
depth) has been observed between different simulations 
of increasing mass and spatial resolution, as long as the 
effect of finite mass resolution on the simulated star for- 
mation history are properly taken in account. We have 
also shown that the density dependance of the neutral 
fraction is close to the one predicted by photo-ionization 
equilibrium, as long as the effect of self-shielding are con- 
sidered when defining the properties of the UV field. It 



also appears that without any other ingredients, our sim- 
ulation fails at reproducing the z ~ 6 constraints on 
the neutral fraction of hydrogen and the i ntensity of the 
UV ba ckground, in a similar manner to iFinlator et al.l 
(|2009bD . 

By combining our best resolved simulation (12.5 
Mpc/h and 1024^ particles) with our largest simulated 
volume (100 Mpc/h with 1024^ particles), we have intro- 
duced a subgrid clumping model in our chemistry solver, 
consistent with the one derived by e.g. IKohler et all 
(|2007). We have shown that, although this clumping 
factor model is quite simplistic, its allowed us to repro- 
duce the level of neutral gas deduced fr om the spectra 
of high re dshift quasar s, as did previouslv lGnedin fc FanI 
(1^06) or iTrac h CenI (2007) among others. However, 
our estimation of the average photoionization rate is still 
at least a factor of 2 above the observational constraints. 
This "semi-success" can be explained by the fact that 
the average radiation intensity and the average neutral 
fraction depend on different regions of the gas distribu- 
tion and one cannot simply deduce one from the other 
using photoionization equilibrium : in other words, if one 
constraint is satisfied, the other can't be. However, we 
have argued that the photoionization rate is probably 
a more robust observational constraint than the neutral 
fraction. This suggests that some effort should still be 
done in our modelisation to reproduce the level of the 
UV background at z ~ 6. 

Among several prospects, one obviously think of in- 
creasing the resolution of the calculations. With GPU 
acceleration 2048^ hydro+ radiative transfer calculations 
are within reach. However, it clearly appears that cou- 
pled hydrodynamics and radiative transfer simulations 
are necessary at this stage, since an increase in resolu- 
tion will inevitably rise the question of the impact of ra- 
diation on mini-haloes or on the star formation history. 
Also additional physics should be implemented, such as 
multi-group radiative transfer, where the importance of 
preheati ng by X-ra y s cou l d the refore be fully assessed 
(see e.g. iFurlanettol (j2006r ): [Sh ull fc Venkate san (2008)). 
but also population HI stars (jTrac fc CenI ((2Q07t )) and 
varying star formati o n efficiencies and es cape fractions 
(jCnedin et al.l (|2008h : IWise fc CenI (|2009l )). Overah, on 
a final positive note, our current results indicate a satis- 
fying trend of cosmological calculations toward satisfying 
observational constraints. 
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Fig. 19. — Evolution of the neutral fraction distribution with redshift along with the evolution of the average value using mass weighting 
(bottom row) and volume weighting (top row) for the 100 Mpc/h box with clumping. The left column stand for experiments with high 
normalization clumping and the right one for the low normalization clumping model. Contours show the density probabilities of neutral 
fraction with high probability densities in red and low ones in blue. 
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APPENDIX 
ON THE GPU IMPLEMENTATION OF ATON 

This section comment in further details the implementation of ATON on Graphics Processing Units (GPU). The whole 
development has been performed using the version 2.2 of the CUD A extension to the C language, developed by Nvidia 
for its graphics devices. However this section should be seen as a commentary of the implementation process rather 
than a full description of the programming details : the field of GPU programming is currently in full expansion, 
several standards/programming languages are competing with each other and many specific programming details are 
likely to be quickly outdated. For these reasons we choose to stick fairly general techniques, comment the suitability 
of the calculations to multithreaded calculations and provide the general tricks of our development to optimize the 
performances. 

Le us first recall that GPU computing relies on two separate hierarchies: a hierarchy of memories and a hierarchy 
of tasks. Regarding the first aspect and because GPUs are devices physically separated from the host, they possess 
their own memory known as the video memory or 'global memory' in the CUDA nomenclature. The transfer rate 
between the host and the GPU is therefore strongly limited by a bus and the best performances can be achieved if all 
the calculations are performed on the device, i.e. without transfer between the host and the GPU. An ideal situation 
would be to transfer the initial conditions (ICs) on the device and let it process the calculation on its own with the 
host acting merely as a driver of the calculation. This constraint is satisfied by the GPU implementation of ATON : the 
host sends signals to the GPU in a regular fashion to advance the simulation within a time step and from one time 
step to another but it never actually computes anything on the data. More precisely, the ICs are sent on the GPU, 
then the host asks to the GPU to compute the radiation transport, then the chemistry and the cooling. Then the 
same signals are sent again to the GPUs to perform the next time step until the simulation is completed. If required, 
the data are transfered back on the host to write the snapshot on the disk but such situation occurs only once in a 
while (typically once every 5000 time steps in our case). In such a procedure, only the host is aware of the fact that 
a time evolving calculation is performed but only the GPU does actual calculations. Furthermore, the global memory 
can be as big as 4GB on current devices and is used to store the data (like e.g. a large grid of values). This memory 
space is usually sufficient but slow in access. On-chip memory also exist, with fast access, but is usually small (of 
the order of 16 kB) and more importantly, still requires an access to the slow video memory to be filled. Therefore if 
possible, any memory access should lead to a significant 'number crunching' in order to make these memory transfers 
worthwhile. ATON fulfills this requirement quite easily since every time a cell is accessed (containing an energy density, 
fiux, temperature, ionization fraction and baryon density), the cell is fully updated and requires a transport calculation 
and the resolution of the ionization and energy balance equations which are quite demanding in terms of arithmetic 
intensity. 

Regarding the hierarchy of tasks, GPUs are eflicient in performing tasks (or execution threads) in parallels which 
are: 

• independent. If a given thread has to wait for the completion of another one to perform its calculation (e.g. 
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in a naturally sequential algorithm like a reduction iSengupta et al.l (|2QQ7f)) or i f two threads try to update 
simultaneously the same value (in e.g. histogramming calculations, I Aubert et al.l ()2QQ9f )). specific algorithmic 
techniques must be employed to keep the parallelism efficient. On the other hand, if calculations do not interfere 
with each other then porting these tasks on gpu architecture is usually quite easy. 

• predictable. Tasks can be unpredictable in their operations (if-else branches) or in their memory accesses. The 
former lead to divergence between threads where their execution tracks are executed sequentially by the CPUs 
thus reducing the efficiency of parallelism. If divergences are limited to exceptions (i.e. have a small chance to 
happen) and are hidden in intensive calculations their impact remain small. The latter lead to non coalescent 
and non aligned memory accesses which greatly impact the performances. 

• compact. A task is compact if it uploads data in a compact region in memory. Again, compact calculation leads 
to coalescent memory accesses which greatly improves the acceleration of the calculation. 

It turns out that ATON possesses these three qualities. To demonstrate it, let us first recall that the radiation transport 
equations can be written in a generic manner as: 

du ^ dF(u) ^ ^ ^^^^ 
dt dx ' 

where is a set of conserved variables (energy density and fiux in the case of radiative transfer), F{u) the associated 
fiux, and 5 is a generic source term. We considered a ID transport for simplicity. It translates into: 



At Ax 



Sf, (A2) 



when one considers an explicit finite difference (FD) scheme in order to update u at position i at time p-\- 1. Usually 
the intercell ffux can be exactly solved or approximated using the values in the neighboring cells through an operator 
g and for instance: 

Fz+i/2=g{hi^l/2). (A3) 

Moreover the chemistry /temperature updates plus the effect of absorption can in our implementation be formally 
written as 

K+\x?+\rf+i) = $K+\<,rf), (A4) 

where T and x stand for the temperature and ionized fractions while u^~^^ stand for the conserved variables updated 
after the transport. As one deals with grid-based structures it is natural to assign a thread to the update of a specific 
cell. From there it can be seen that ATON is well suited for GPU parallelism according to the three qualities listed 
before : 

• independence : all these calculations are explicit : the only intermediate results needed are the transport-updated 
u^~^^ and it is a local value. All the other inputs are initial state values which do not require communications 
during the calculation per se. As a consequence all the cell updates (and therefore the threads) are independent 
and the overall procedure is free of threads collisions or sequential calculations where one thread has to wait the 
completion of one or several other tasks. 

• predictability : here the calculations are at least 'memory-predictable'. Updating a given cell requires data in a 
region which is known by advance, i.e. the updated cell plus its 6 neighbors in 3D. Operation branching occurs 
in the cooling and chemistry calculations and has some impact on the performance (see the the subsequent 
analysis). 

• compacity : again the calculation requires a 7 cells stencil for a single thread which is quite compact and allows 
to enforce the coalescent memory accesses, as shown hereafter. 

Finally, these devices can easily be used at full power if two properties are satisfied during the calculation: the data 
in global memory should be accessed in a coalescent and aligned fashion. Figure [2T] allows us to explain the coalescence 
in details assuming a 2D calculation. The data is accessed in a coalescent fashion if a serie of threads reads data which 
are organized in a sequential manner in the memory. In Figure ETJ a 2D field is physically stored in memory as a ID 
sequence listed by numbers 1 to 25. If a sequence of threads (shown colored) is set in such a way that threads access 
the data 'vertically' (left scheme), they physically access data which are separated by jumps of 5 units : such a strategy 
is non coalescent. Conversely if the sequence of threads is organized 'horizontally' (bold border in the right scheme), 
they access to data which are physically next to each other, i.e. in a coalescent fashion. All the threads in ATON are 
arranged following this strategy in order to enforce coalescence. For example the chemistry /cooling step is performed 
with threads along the physically coalescent direction. The radiation transport step is slightly more difficult to set up 
as it involves a finite difference along all the directions: 
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For the finite difference performed along tiie coalescent direction, the coalescence is naturally satisfied. In order to 
avoid multiple access to the same data by neighboring threads, the coalescent values are uploaded in shared (on chip) 
memory once and calculations are performed from this shared memory. For the finite difference performed along the 
non-coalesced direction (vertical in Figure [21]), a naive strategy would have been to upload the vertical values in shared 
memory (left column), i.e. along the direction of the finite difference. However this would imply non coalescent access. 
The correct way to deal with this finite difference is shown on the right panel of Figure [2T1 First the threads should 
be organized along the coalescent direction (shown colored). Then all threads upload the data 'above' the region to 
update (shown with a bold line) in shared memory along the coalescent direction. The same is done for the data 
'below' the region to update. Finally the finite difference can be performed. From our experience, switching from non 
coalescent to fully coalescent strategies can improve the performance of the GPU calculation by factor of 10 to 100. 
It should be said that such a 'trick' is not specific to CPU-based calculation but given the high parallelization of the 
devices such an optimization has a more dramatic impact on their performances compared to usual scalar processors. 

Aligned access is more specifically related to the hardware used. Typically, the data should be accessed in sets of 
64, 96 or 128 words which is usually satisfied using thread configurations which rely on powers of two. A additional 
constraint is that the range of memories accessed by these sets should be aligned with 'preferential' memory addresses, 
usually multiples of 16. When dealing with arrays with dimensions equals to powers of two, any access of sets of 64, 
96 or 128 words will be automatically aligned. Non aligned access will result in multiple memory queries on aligned 
addresses in place of a single one. It turns out that such situation is quite common as boundary conditions usually add a 
layer of data around the actual computational volume making e.g. a 128 x 128 x 128 cube a (128+2) x (128+2) x (128+2) 
cube, which breaks the alignment. We circumvent this by making the boundary layer larger than required by the code 
since we are not limited by memory (e.g. 128 x 128 x 128 cube becomes a (128 + 32) x (128 + 32) x (128 + 32) cube). 
Typically an additional factor of 2-3 of acceleration can be achieved by enforcing alignment. 

As an illustration of the computing abilities of GPUs, we show in Fig. [22] the average duration of a time step of 
a radiative transfer post-processing performed on the cosmological test of the comparison project. The same test 
has been performed at several resolution and executed on a Opteron 2.2 GHz and a GeForce 8800 GTX, which are 
comparable in terms of generation (2005-2006). A significant acceleration close to 80 was observed on GPU compared 
to a monocore run on CPU. Both calculations were performed using single float precision and no difference could be 
seen between the calculations at such precision. 
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Fig. 21. — Comparison of two finite differences (FD) strategies on a 2D field in memory. The sequence indicates the actual organization of 
cells in memory, the coalescent direction. We consider the case where the FD should be performed along the non-coalescent direction. Each 
color represent the location to be computed by a thread. Left : 'vertical strategy' where the threads are arranged along the FD direction. 
Right : 'horizontal Strategy' where the threads are arranged perpendicular to the FD direction. The 'horizontal strategy' maximizes the 
performance of the GPUs due to coalescent memory accesses. See main text for details. 

ATON is able to run on configurations with multiple graphical devices. It implies that GPUs should communicate in 
order to exchange boundary conditions. This is simply done by adding an MPI layer over the GPU inner parallelization. 
Once a GPU has updated its subgrid the following sequence happens at each time step: 

• A GPU-buffer is created on each GPU to gather the data to be passed at each time step (here namely the 
radiative energy and fluxes) 
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Fig. 22. — Average time steps duration for the cosmological test of the comparison project for CPU and GPU, at different resolutions. 
The GPU and the CPU are roughly equivalent in terms of generation. 

• this GPU-buffer is transmitted to the host into a CPU-buffer 

• the CPU-buffers are exchanged using regular MPI-based instructions 

• the updated CPU-buffers are transmitted to the CPUs into the GPU-buffer 

• the data inside the buffer are distributed into the correct radiative variables. 

Considering the coalescence and alignment constraints depicted above, the natural parallelization for multi-gpu cal- 
culations is 'slab-based' and for example a 512x512x512 calculation would be divided in 4 calculations 512x512x128 
on 4 CPUs. The reason is that at each time step, the radiative energy and fluxes at the boundaries should be passed 
to the neighboring GPU by collecting them into a buffer and a slab-based configuration implies that the collection 
is performed in a coalescent manner (and the distribution as well). However we choose to stick to sub-cube based 
parallel configuration in the prospect of coupling with N-Body+hydro integrators (such as RAMSES) which parallel 
configuration is closer to 'sub-cube' segmentation than 'slab-based' ones. Furthermore, the slab-based decomposition 
cannot be naively applied for large problems because of hardware limitation such as the amount of memory per kernel 
(16KB) or the number of threads per block (512). Conversely, it implies that non-coalescent access are performed 
during the gathering/distribution phases (see Fig. [23|) . Finally it should be noted that such communications require 
systematic transfers between the hosts and the CPUs through the PCI bus, which act therefore as a bottleneck in 
the communications. In Fig. [24] we present the average duration of the time steps for several multi-gpu configuration 
and problem sizes and the acceleration as a function of the number of CPUs. Even though the implemented paral- 
lelization is simple, the speedup trends is quite optimal and the amount of time spent into the communication remains 
reasonable at levels of 10-15%. This number is quite important by standards of parallel high performance computing 
but compared to an initial acceleration of a few tens (compared to the CPU), this overhead remains small enough to 
assess large problems. 
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Fig. 23. — Two communication strategies for multi-gpu calculations. The coalescent direction is assumed to be the horizontal one. 
Top : slab-based communication. Bottom : cube-based communication. For the cube-based decomposition, some communications involve 
gathering and dispatching data in a non coalescent manner. The cube-based technique has nevertheless been chosen for ATON to assess large 
problems, to reduce the shared memory usage and in the prospect of coupling ATON to integrators with cube-based decomposition. 
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Fig. 24. — Timings of multi-GPU calculations. Left: average duration of a time step for a typical cosmological field used in this work and 
for several parallel configurations. The first integer stand for the total number of cells along one direction and the second stands for the 
number of GPUs. For example 512-8 means a 512^ radiative transfer calculation distributed over 8 GPUs. Right: Acceleration as a factor 
of the number of GPUs compared to a mono-GPU calculation, the dot stand for the actual measurement while the straight line stand for 
the perfect acceleration trend. Measurements were performed on the Titane supercomputer (CCRT-CEA) using Tesla C1060 GPUs. 



